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Abstract 

Oh 

• Multivariate Poisson random variables subject to linear integer constraints arise in several 

, application areas, such as queuing and biomolecular networks. This note shows how to compute 

conditional statistics in this context, by employing WF Theory and associated algorithms. A 
symbolic computation package has been developed and is made freely available. A discussion 
of motivating biomolecular problems is also provided. 

> 

^ ; 1 Introduction 

' In application areas such as queuing and biomolecular networks, one is often interested in the 

. study of independent Poisson random variables subject to side information represented by linear 

I integer constraints. We show how to reduce the computation of conditional statistics for this 

problem to the evaluation of coefficients of generating functions. These coefficients can, in turn, 
be computed using Wilf-Zeilberger (WZ) theory. We discuss this reduction, and make available 
a symbolic computation package developed for that purpose. 
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We next provide a formulation of the problem, and briefly indicate its motivations. In Section^ 
we explain the reduction to exponential type generating functions, and in Section [3] we discuss 
the fact that recurrences can be obtained for their coefficients. Section U] discusses the special 
case of just two side constraints, which is considerably simpler. Section [5] illustrates the use of 
the symbolic package through a number of examples, all of which arise from the biomolecular 
networks discussed in Section [6l An Appendix includes a proof of the basic representation 
theorem which enables application of this techniques to certain reaction networks. 

Suppose that we have n independent Poisson random variables, Xj (j = 1 . . .n), with parame- 
ters Xj respectively. In other words 

Pr (Ai = h,X, = k^,...,X^ = K) = e-(Ai+...+A„)^^ • • • TT • (1) 



'Accompanied by Maple package MVPoisson downloadable from 
|http: //www . math . rutgers . edu/~zeilberg/mamariiii/mamarinihtml/mvp .html| 
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Suppose that wc can't observe the Xj^s directly, but only a certain number, m, of linear com- 
binations of them: 

n 

where (aij) is a certain m x n matrix with non- negative coefficients. 
We are interested in the following questions: 

1. Can one compute (fast!), for any given vector (61, ... ,6m) (possibly with large coordi- 
nates), the probability 

F{bu...,bm) :=Pr(yi = 6i,...,ym = M . 

2. Can one compute (fast!), for any given vector (61, . . . ,bm), (possibly with large coordi- 
nates) the conditional expectation 

Gj{bi,...,bm) -EiXj \Yi=bi,...,Ym = bm] , (1 < j < n). 

3. More generally, can one compute (fast!), the higher moments 

G'^;\bi,...,bm):=E[X'j \Y, = bi,...,Y^ = bm] , (r > 2) , 

that would immediately allow us to compute the moments about the mean. Can we 
compute (fast!) mixed moments, in particular the covariances? 

For example, suppose that Xi is Poisson with parameter Aj, i = 1,2, Xi and X2 are independent, 
and ^4 = (1 1). Thus, Y = Xi + X2 is Poisson with parameter Ai + A2. Fix a non-negative 
integer b. The probability that Xi = k given that Y = Xi + X2 = b is: 

(Ai+A2)-^l ^2~^ / p-(Ai+A2) (:^^i±M! 

k\ (b-ky. I b\ 

which equals 

(:)/(!-.)" 

with p = ■ It follows that (-^i|i^ = 6) is a binomial random variable B(b,p), and similarly 
(X2|y = 6) is a binomial random variable B{b, 1 —p). Statistics for binomial variables (means, 

variances, and all moments) are well-known and easy to compute. On the other hand, for more 
complicated linear constraints, and especially if more than one such constraint is imposed, 
statistics become considerably harder to obtain. 

A simple example of where this type of problem might arise is as follows. Suppose that the 
random variables X^ count the number of calls placed, during a typical time period, to an 
international service center and originating from a specific country or geographical area and in 
a specific customer language. For example, Xi may represent the number of English-speaking 
callers from the USA, X2 the number of Spanish-speaking callers from the USA, X^ the number 
of English-speaking callers from Latin America, X4 the number of Spanish-speaking callers from 
Latin America, X^ the number of English-speaking callers from the UK, and Xq the number of 
Spanish-speaking callers from the UK. It is natural to assume that each of the random variables 
is Poisson-distributed. Now, suppose that we want to know what are the statistics of the variable 
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Xi, for example, the variance in the number of Enghsh-speaking callers from the USA, subject 
to the additional information that the total number of Spanish-language calls received was 100 
and that the number of calls received from the US was 50. This is E[Xi \Yi = 100, 12 = 50] with 
Yi = X2 + X4 + Xq and I2 = Xi + X2. (More interestingly, one might have mixed information, 
represented by more general linear combinations.) We were originally motivated in this work 
by applications in molecular biology; we defer to Section [6] a detailed discussion and examples. 



2 The generating function 

Fix a matrix A = {a-ij) (1 < i < m, 1 < j <n), once and for all. Let 

fcl,...,fc„>0 1 z n 

aiiki + . . . + ai^kn=bi . . ,a^lki + . . . + amnhn—bm 

(value is zero if the sum is empty). Thus, our focus will be on computing Fq, from which we 
can easily obtain F, since 

F{bi, ...,h^) = e-(^i+-+^")Fo(6i, ...,h^). 

Let /o be the (multivariable) generating function of Fq, in other words 

/o(zi,...,Zm) = Fo{bi,...,bra)zl' ...Z^^ . 

bi>0,...,bm>0 

Our quantity of interest, Fo(6i, . . . , 6^), is the coefficient of . . . in the multivariable 
Taylor expansion about the origin of /o(zi, . . . , Zm,)- 

We have: 



1 ^2 I bi b„ 



kl\ k2\"' knl ^ 

bi>0,...,bm>0 \ fci,...,fe„>0 ^ ^ " ' 



By changing the order of summation, this equals 

kil k2\'" knl ' 



IT 

fci>0,...,A;„>0 



2 „ai\ki+...+ainkn ^amlki+...+amnkn 



^ kl\ fcj 

fci>0,...,fc„>0 

(AiZ^^^Z^^i _ _ _ z^™i)'=i \ / ^ (An^"!"^^" . . . z'^^f^ 



^fci>0 / \fc„>0 

exp^AiZ^ ^2 . . . j . . . exp^AnZ]^ • • • y 

cxp \^Ai^-j^ ^2 • • • ~r . • • ~r A^^-j^ ^2 . . . 



We have just derived 
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Theorem 1: 

(n m 
j=i i=i 

The conditional probability 

Pr {Xi = ki, X2 = k2, . . . , Xn = kn \ Yi = 61, . . . , = bm) 

is the same as the expression in ([T]) divided by F{b), provided that Yl^=i'^ij^j ~ ^« 
and is zero otherwise. Recall that the rth factorial moment of a random variable W, E[W^^^], 
is, by definition, the expectation of W\/{W — r)!. We are interested in the conditional factorial 
moments of Xj given Y = b, which we will denote as E[Xj^^ | Y]. By definition, E[Xj^^ | Y] is 
the following expression divided by Fo(6): 

ki,...,kn>0 

a-llki+. . . + a-i^kn=bi , . . . ,a^iki + . . . + amnkn=bm 

Now, expression ([2]) is the same as the result of applying the operator A^(^j-)^ to Fo{bi, . . . , bm) 
when viewing the A 's as variables and not as constants. On the other hand. 
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bi>0,...,b,n>0 



and therefore expression ([2]) is the same as the coefficient of . . . in Xji-^Y foizi, . . . ,Zm) 
Since, as formal power series, we have the representation in Theorem 1. we conclude that 
expression ([2]) is the same as the coefficient of zj^ . . . in (ni^i z^^'^Y fo{z), which is the same 
asF{bi—raij,b2 — ra2j,...,bm — ramj) when all 6j — rajj > and zero otherwise. In conclusion, 
E[Xj^^ I Y] equals -Fo(&i — raij, 62 — '''a2j, . . . ,bm — ramj) divided by i*b(&)- We have proved: 

Theorem 2: The conditional factorial moments ElXj"^ | Y] are given in terms of the Foib^^ • • • ? ^m) 
by 

Fo(6i - raij,b2 - ra2j, ...,bm- ramj) 

Fo{bi,...,bm) 

when all bi — raij > and zero otherwise. 

So everything depends on a fast computation of the coefficients Fo{bi, . . . , bm), of /o(-Zi, • • • , -Zm)- 

By taking mixed partial derivatives, we can easily derive analogous expressions for mixed mo- 
ments, in particular, the covariances. 



3 Recurrences 

From now on, let's assume that the entries of A, (aij), are non-negative integers. In that case, 
we can write 

/o(z) = exp(Q(zi, . . . , 
where Q{zi, . . . , Zm) is the polynomial 

n m 

Q{zi,...,Zm) ■.= YXjYlzi'' . 

j=i i=i 
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By Cauchy's theorem, we can express F{bi, . . . , hm) as a multi-contour integral: 

F{bi,...,bm)= { — ] / ... / r--^ . ,1 dzi...dZm 



.^'"-y J\zi\=c J\zm\=c Z^ ■ ■ ■ Zm 

By the celebrated Wilf-Zeilberger theory ([16j), F{bi,...,bm) satisfies pure linear recur- 
rences with polynomial coefficients in each of its arguments. This means that for each i 
between 1 and m, there exists a positive integer Ri (the order) and polynomials Pr^\bi, . . . , b.^) 
(0 < r < Ri) such that the following holds, for all . . . , bm)'- 

Ri 

P^^{bi, bm)F{bi, . . . , bi + r, k+i, . . . , 6„) = . 

r=0 

Once these recurrences are known, one can compute F{bi, . . . , bm) in time linear in 6i + . . . + 6m 
and with constant memory allocation (one only needs to remember, at each stage, a constant 
number of values). 

In rare cases, the leading term of the recurrence would vanish, in which case, we would encounter 
a (discrete) "singularity", and would not be able to go on, since we would have to divide by 0, 
but in that case one can show that there is an alternative route, using another order of applying 
the recurrences. 

The Apagodu-Zeilberger[3] multi-variable extension of the Almkvist-Zeilberger|JLj algorithm 
can find such recurrences explicitly. Unfortunately, for matrices A with more than three rows, 
the time taken to find such recurrences is prohibitive, but many matrices of interest have two 
or three rows. 



4 Two-Rowed matrices 

If the matrix A only has two rows, and its entries are only {0, 1}, then one can express -F(6i, 62) 
as a single sum. Indeed, let 

• coi be the sum of the Aj's for which aij = 0, 02 j = 1, 

• coi be the sum of the Aj's for which aij = 1, 02j = 0, 

• coi be the sum of the Aj's for which aij = 1, 02 j = 1. 



Then, we have 
and so 



Q{z) = coizi + C10Z2 + C11Z1Z2, 

k=0 

E (coi^l)°(cioZ2)^(cil^l2:2)'^ 

">0,/3>0,7>0 ^ ' 

J3 „7 

E ^01 "^10 ^11^1 ^2 

a>0,/3>0,7>0 ^ ' 
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To get Fo(6i, 62)) we must extract the coefficient of ^^^2 which entails a = 61 — 7, /3 = 62 — 7, 
and we have the single-sum binomial coefficient (hypergeometric) sum (replacing 7 by fc) 

min(bi,62) j, bi-kM-k 



Fo{hiM)= 



Using the Zeilberger Algorithm ([171 , we get the following linear recurrence: 
(ciobj + 4cio - 2C1062 + 4cio6i - cio&i62)i^o(&i + 2, ^2)+ 

(-Cii6i-Cii+62Cll+62CloCoi-26lCioCoi-3coiCio)Fo(^l + l,&2) + (cilCio+CoiCio)i^o(&l,fe2) = . 



5 The Maple package MVPoisson 

All this is implemented in the Maple package MVPoisson accompanying this article. It is 
available from the webpage of this article 

http : //www . math . rutgers . edu/~zeilberg/mainariiii/mamariinhtml/mvp . html , 

where one can also find sample input and output. 

We next discuss several examples of matrices A and computations using MVPoisson. These 
examples, of interest in themselves, are motivated by the biochemical networks discussed in 
Section [H 

Mostly, we illustrate the use of the command "RecsV" , which provides the recurrences satisfied 
by the coefficients Fq, but we also show a few examples of other commands that compute 
moments. 



5.1 A one-row example 

The matrix A is: 

^=(11). (3) 

As discussed in the Introduction, the conditional random variables (Aj|y = b) are binomial. 
With the notations of this paper, 

i+j=b 

This function Fq clearly satisfies the following recurrence: 

Foib + l) = ^^^0(6) 

with Fo(0) = and F(l) = Ai + A2. Indeed, for the matrix A in the "RecsV(A, A, 6)" 
command provides the following recurrence: 

Fo(6i + l) = Al + ^i.o(6i) 
i + Ol 

with initial condition Fq{1) = Ai + A2. 
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5.2 A two-row example 

The matrix A is: 



For the matrix A in ([3]), the "RecsV(A, A, 6)" command provides the fohowing two-dimensional 
recurrence: 

^0(61 + 2,62) = . , , ^ Fo{bi + l,b2) 



on bi and 



i^o(^i,^2 + 2) = Fo(6i,62 + l) 

Ai(b2 + ^) 



on 62) with the following initial conditions: 

Fo(l,2) Fo(2,2) 
Fo(l,l) i^o(2,l) 



A3 + A1A2 A2A3 + ^AgAi 

A1A3 -I- ^A2Af ^Ag + A2A1A3 + jA^Af 

5.3 Another two-row example 

The matrix A is: 

For the matrix ^4 in ([5]), the "RecsV(A, A, 6)" command provides the following two-dimensional 
recurrence: 



F„(6i + 3,6.2) = J^/'C'^+^'I'-') 



2A2(3 + 6i)(2 + 6i) 
+ 2A2(3+\k2 + 6i)^°^^^'^'^ 



on bi and 



Ai(0 -h 02j O2 + o 
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on 62 with the initial conditions: 



See Figure m 



Fo(l,3) Fo(2,3) Fo(3,3) 
Fo(l,2) Fo(2,2) Fo(3,2) 
Fo(l,l) Fo(2,l) Fo(3,l) 



2 O 



A3 A1A2 A2A3 

A1A3 ^A| + ^X2^l A2A1A3 
2-^1-^3 2'^1'^3 + 6^2Ai gAg + 2A2Af A3 



4 O O O O 



O O 



O O 



a- 



Figure 1: Two-dimensional recursion fills- in the values of Fo{i,j) at the locations indicated 
by the open circles, using the initial data given at the locations indicated by the filled circles. 
For programming convenience, indices are positive integers: in the example shown, the initial 
conditions are specified for i,j = 1, 2, 3. 



5.4 A two-row example with five columns 

The matrix A is: 

. 1 1 1 \ 

^ = ( 1 1 1 1 j • 

For the matrix A in ([6l), the "RecsV(^, A, 6)" command provides the following two-dimensional 
recurrence: 

77 /L , n U \ (-&2A5 - 62A4 + A5 + A4 - A1A3 - A2A3 + 61 A5 + 61 A4) „ , , , ^ 

+ = (Ai + A2)(2 + M Foih + IM) 

A3(A5 + A4) p 



on 61 and 



, , „N (-A5 - A4 + 61 A5 + 61 A4 - 62A5 - 62A4 + A2A3 + A1A3) 
Fo(6i,62 + 2) = Fo(6i,62 + l) 

(A5 + A4)(Ai + A2) 
+ A3(62 + 2) "^^^^^'^^^ 
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on 621 with the initial conditions: 



Fo(l,2) Fo(2,2) 
Fo(l,l) Fo(2,l) 

A5 + A4 + (Ai + A2)A3 (As + A4)(Ai + A2) + ^(Ai + X2fX3 

A3(A5 + A4) + i(Ai + A2)Ai i(A5 + A4)2 + (Ai + A2)A3(A5 + A4) + \{Xi + X2?Xl 

The command "CorMf(A, A, 6)" provides the correlation matrix for the Xj's subject to Ax = b 
and assuming that the parameters are A. For the matrix A considered here we obtain, for 
example with A = (1, 1, 1, 1, 1) and b = (5, 5), the following result: 

/ 1.0 -.3647053019 .5636021195 -.2407443460 -.2407443460 \ 

-.3647053019 1.0 .5636021195 -.2407443460 -.2407443460 

.5636021195 .5636021195 1.0 -.4271530174 -.4271530174 

-.2407443460 -.2407443460 -.4271530174 1.0 -.6350805992 

V -.2407443460 -.2407443460 -.4271530174 -.6350805992 1.0 / 

Note the negative entry for the correlations between Xi and Al2. This corresponds to the fact 
that Y2 = Xi + X2 + X4 + X5 = 5, so increases in Xi should be expected to result in decreases 
in X2. Similar interpretations apply to the other entries. 

5.5 A two-row example with six columns 

The matrix A is: 

^ _ f 1 1 1 1 \ 

^ - V 1 1 1 1 j ■ 

For the matrix A in ([7]), the "RecsV(yl, A, 6)" command provides the following two-dimensional 
recurrence: 

/, , r, , N A3 -f- Ae — A5A4 — A5A2 -I- 61 A3 + 61 Ae — A1A4 — A1A2 — 62A3 — 62A6 , 1 i, ^ 
+ = (A4 + A2)(2 + M F,ib, + l,b2) 

(A3 + A6)(A5 + Ai) 
+ (A4 + A2)(2 + m'^^^'^'^^^ 

on 61, and 

/, , , r,\ ^i^e + ^^^3 ~ ^2X5 — 62A3 -I- A1A4 -I- A5A4 -I- A1A2 -I- A5A2 — Ae — A3 

^°('^''^ + ') = (62 + 2)(A5 + A0 Fo{b„b2 + l) 

+ (A3 + A6)(A4 + A2)/(62 + 2)(A5 + Ai)Fo(6i, 62) 

on 62) with the initial conditions: 

Fo(l,2) = A3 + A6 + (A4 + A2)(A5 + Ai) 

Fo(2,2) = (A3 + A6)(A4 + A2) + ^(A4 + A2)2(A5 + Ai) 

Fo(l,l) = (A5 + Ai)(A3 + A6) + ^(A4 + A2)(A5 + Ai)2 

Fo(2,l) = i(A3 + A6)2 + (A4 + A2)(A5 + Ai)(A3 + A6) + ^(A4 + A2)='(A5 + Ai)2. 
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5.6 A three-row example 



The matrix A is: 

/ 1 1 
^ = 1 1 I . (8) 
\ 1 1 1 1 

For the matrix A in ([8|), the "RecsV(^, A, 6)" command provides the fohowing two-dimensional 
recurrence: 

r^/, , o I, \ —t>2^e — &2A3 — A1A2 — A1A4 — A5A2 — A5A4 + 61 Ae + foiAs + Ae + A3 , , 1 , n 

+ = (A2 + A4)(2 + M m + 

(A6 + A3)(Ai + A5) 
+ (A2 + A4)(2 + M 

on 61, and 

r^/, 7 , r,\ ^1-^3 + feiAe — A3 — Ag + A1A2 + A5A2 + A1A4 + A5A4 — 62A3 — 62A6 , , i\ 
^^'^''^^'^ = (^2 + 2)(A, + A5) Fib,M + l) 

, (A6 + A3)(A2 + A4) 

+ (62 + 2)(Ai + A5) "^^'^''^^ 
on 62, with the initial conditions: 

Fo(l,2) = A6 + A3 + (A2 + A4)(Ai + A5) 

Fo(2,2) = (A6 + A3)(A2 + A4) + (l/2)(A2 + A4)'(Ai + A5) 

Fo(l,l) = (Ai + A5)(A6 + A3) + (l/2(A2 + A4))(Ai + A5)' 

Fo(2, 1) = (1/2)(A6 + A3)' + (A2 + A4)(Ai + A5)(A6 + A3) + (1/4)(A2 + A4)'(Ai + A5)' . 



5.7 A four-row example 

The A matrix is: 

/ 1 1 1 \ 

1 1 1 , . 

"OOlOOllO" ^ ^ 

V 1 1 1 / 

For 4-row matrices as this one, the package MVPoisson is not able to return recurrences in 
a reasonable amount of time. However, one can now use the generating functions directly to 
compute the relevant quantities of interest, except that it is no longer possible to treat large 
inputs. 

The command "SipurD" is used to generate averages and variances ( "SipurD2f " implements a 
more efficient algorithm specifically for matrices with two rows). For the matrix j4 in ([9]) and, for 
example, A = (1, 1, 1, 1, 1, 1, 1, 1) we obtain that E[Xi\Y = b] 1.897 when b = (10, 10, 10, 10) 
and ~ 2.813 when b = (20,20,20,20) (the value may be obtained to arbitrary precision), and 
that the variance of Xi conditioned on this same 6 is ~ 1.112 when b = (10,10,10,10) and 
~ 1.379 when b = (20,20,20,20). The program also guesses asymptotic formulas for these 
quantities as a function of the entries of b, and as such is a useful tool in research, suggesting 
possible general formulas that one could attempt to prove. 
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6 Biochemical applications 



We now explain how the problem studied here arises in the context of systems described by 
chemical network theory, and in particular chemical kinetics. There are two fundamentally 
different ways to mathematically model chemical reactions. One of them is based on differential 
equations modeling, and the other one on stochastic models. Our problem arises from this 
second approach. However, to understand its interest, it is important to first discuss the 
differential equation case. Differential equation models are useful when the number of molecules 
is very large, so that a continuous approximation is appropriate. 

Suppose that n "species" interact through a network of reactions. The term species is used 
to refer to the elementary objects participating in the interactions: in molecular biology, these 
are typically ions, atoms, or molecules; in population biology and ecology, they may represent 
distinct animal or plant populations, particular age groups, and so forth. It is natural to describe 
such a network by a system of n differential equations which constrains the time evolution of 
the concentrations (or average populations) of the various species. These sets of differential 
equations take the following general form: 

X = TR{x) 

(dot indicates time derivative) where x = x{t) is an n-vector of species concentrations (non- 
negative real numbers) and F is an n x m matrix, called the "stoichiometry" matrix, whose 
columns describe how many units of each species are created or destroyed by each of m possible 
reactions. The components of the m-vector R{x) quantify the reaction rates for each of the m 
reactions, as a function of the current concentrations as well as parameters (reaction constants) 
that reflect physical and chemical information. 

Chemical reactions are often described by graphs whose nodes are the "complexes" (the species, 
or combinations of species, that participate in the reactions) and whose edges are labeled by 
reaction rate information. Often, a mass-action kinetics model is used, which means that the 
reaction rate is proportional to the product of the concentrations of the reactants, and only the 
proportionality constant, called the kinetic constant associated to the corresponding reaction, 
is displayed on an edge. There is a systematic and simple way to map graph descriptions to 
differential equations. 

Some of the main results in chemical network theory were obtained by Horn, Jackson, and 
Feinberg (see [71 [8] and also [H] for an exposition using a somewhat different formalism). 
These results guarantee that solutions of the system of differential equations are well-behaved 
(stability of equilibria, uniqueness of equilibria modulo stoichiometric constraints), provided 
that certain structural properties are satisfied by the network. The main such theorem is valid 
for what are called complex balanced networks. A sufficient (though not necessary) condition 
for complex balancing is that the network be "weakly reversible" and have "deficiency zero". 
The deficiency is computed as c — i — r, where c is the number of complexes, r is the rank of 
the matrix F, and i is the number of "linkage classes" (connected components of the reaction 
graph). Weak reversibility means that each connected component of the reaction graph must 
be strongly connected. We refer the reader to the citations for details on deficiency theory. Our 
examples are all complex balanced. 

When the numbers of molecules are very small, as is sometimes the case in molecular biology, 
a discrete stochastic model may be more appropriate than a continuous differential equation 
model. Indeed, fluctuations cannot be ignored when dealing with genes (usually one or two 
copies), mRNA's (in the tens), ribosomes and RNA polymerases (up to hundreds) or certain 
proteins that have low concentrations. 
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Stochastic models fully account for the probabilistic nature of reactions. The number of in- 
dividual copies of each species at (continuous) time t is viewed as a random process Xi{t), 
i = l,...,n. The Chemical Master Equation (CME), which is the differential form of the 
Chapman-Kolmogorov forward equation, is a linear first-order differential equation that de- 
scribes the time evolution of the joint probability distribution of the Xj(t)'s. Often, the interest 
is in long-time behavior, after a transient, that is to say in the probabilistic steady state of 
the system: the joint distribution of the random variables Xi = Xj(oo) that result in the limit 
as t ^ oo (provided that such a limit exists in an appropriate technical sense). This joint 
distribution is a solution of the steady state CME (ssCME), the infinite set of linear equations 
obtained by setting the right-hand side of the CME to zero. 

A very beautiful recent observation, made in ^ (basically a rewording of classical results in 
queuing theory in Chapter 8 of pT] , see also [12] for a discussion) is that the complex balancing 
condition, introduced originally for deterministic differential equation models, also guarantees 
that there is a solution vr of the ssCME that is formally the joint distribution of n (the number 
of species) independent Poisson random variables. More precisely, for each deterministic steady 
state X G M>q (that is, TR{x) = 0, in other words, a zero of the vector field TR(x)), the vector 
vr defined as follows is a solution of the ssCME. The vector tt is indexed by the n-dimensional 
lattice of non-negative integers, N = {Ni, . . . , Nn) G ^>o- We write the A^th entry of vr as P{N) 
(thought of as a probability, in steady state, of the event {Xi,X2, . . . , Xn) = {Ni, . . . , Nn)) Let 
us write the product and A^i! . . . A^! as "A!". Then, the assertion is that 

the vector tt whose components are 

(as well as any scalar multiple of this vector) is a solution of the ssCME. We provide a self- 
contained proof of this fact in an Appendix to this paper. 

However, the existence of this product form distribution does not mean that the joint distribu- 
tion of the variables Xi will be independent Poisson, because the solution of the ssCME is not, 
in general unique. The lack of uniqueness stems from conservation laws. Because of possible 
conservation laws, things are a bit subtle. 

As an example, suppose that two molecules of species A and B can reversibly combine through 
a bimolecular reaction to produce a molecule of species C: A -\- B ^ C . Let us denote the 
number of molecules of species A, B, and C at time t by Xi{t), i = 1,2,3 respectively. The 
count of A molecules goes down by one every time that a reaction takes place, at which time 
the count of C molecules goes up by one. Thus, the sum of the number of A molecules plus the 
number of C molecules remains constant: Xi{t) + X^{t) = hi. Similarly, X2{t) + X^[t) = b2, 
because the total count of B and C molecules is also constant. This holds for all t, so taking 
limits as t ^ oo (ignoring technicalities!), we have that, for the steady state random variables, 
still Xi + A3 = bi and X2 + A3 = 62- Let us introduce Yi = Xi + A3 and Y2 = X2 + A3. Thus, 
depending on the initial conditions hi = Ai(0) + A3(0) and 62 = A2(0) -|- A3(0), the limiting 
distribution will be that of Ai and A2 conditioned on Yi = bi and I2 = &2- Once we collect 
this information into a matrix A, in this case 



we are back to the situation where we want to study the behavior of the conditioned variables 
Xi\Yj, where the Aj's are Poisson distributedQ 

*Our discussion is incomplete from a probabilistic viewpoint, as we have not addressed questions of uniqueness 
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The rest of this section discusses various examples. To make the notations compatible with 
usage in probability theory, we use "A" for the Poisson rates (instead of Xi) and k for multi- 
indices (instead of A'') . 

6.1 A simple reversible reaction 

Consider the following reaction: 

Xi ^ X2 (10) 
in which one molecule of substance Xi reversibly transforms to X2 ■ 

This reaction system is complex-balanced, because it is weakly reversible and it has 2 complexes, 
1 strongly connected component, and rank 1, and hence deficiency zero. 

The steady states of this reaction network are given by the solutions A = (Ai, A2) of the equation 
kiXi = k2X2- We may pick, for example, A = {l,ki/k2). 

Every time that the forward reaction takes place, the count of molecules of Xi gets decreased 
by one and the count of molecules of X2 gets increased by one; the converse happens for the 
backward reaction. Thus, the total number of molecules of Xi and X2 remains constant. The 
corresponding A matrix is given in 

6.2 A bimolecular reaction 

Consider the following reaction: 

X,+X2 ^ Xs (11) 

in which one molecule of Xi combines reversibly with one molecule of X2 in order to produce 
one molecule of X^. 

This reaction system is complex-balanced, because it is weakly reversible and it has 2 complexes, 
1 strongly connected component, and rank 1, and hence deficiency zero. 

The steady states of this reaction network are given by the solutions A = (Ai,A2,A3) of the 
equation 

kiXiX2 = k2X3 . 
We may pick, for example, A = (1, 1, ki/k2). 

Every time that the forward reaction takes place, the counts of molecules of Xi and X2 each 
gets decreased by one and the count of molecules of X^ gets increased by one; the converse 
happens for the backward reaction. Thus, the total number of molecules of Xi and X3 remains 
constant, as does the total number of molecules of X2 and X3. The matrix A is as in ([4]). 

6.3 A more interesting bimolecular reaction 

Consider the following reaction: 

2X1 +X2 ^ 2Xs (12) 

fc2 

and convergence. These questions require a careful study of irreducibility properties of the associated Markov 
chains. We are only interested here in the computational problem of obtaining statistics for the conditioned 
variables. 
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which may represent, when Xi = H2, X2 = O2, and X3 = H2O, the reversible creation of a 
molecule of water, when two molecules of the diatomic hydrogen gas combine with one molecule 
of the diatomic oxygen gas to produce two molecules of water. (The forward reaction produces 
energy, and the reverse reaction, breaking water to form hydrogen and oxygen, requires energy, 
for instance through electrolysis. The chemical reaction formalism used here does not account 
for energy production or consumption.) 

This reaction system is complex-balanced, because it is weakly reversible and it has 2 complexes, 
1 strongly connected component, and rank 1, and hence deficiency zero. 

The steady states of this reaction network are given by the solutions A = (Ai,A2,A3) of the 
equation 

/C1A1A2 = k2Xl . 
We may pick, for example, A = (1, 1, \Jk\lk2)- 

The total sum of hydrogen and water molecules remains constant, and for each two molecules 
of oxygen there is one of water produced and viceversa. The matrix yl is as in ([5]). 



6.4 A receptor-ligand model 

Receptor-ligand interactions play an important role in the understanding of the biochemical 
mechanisms that initiate cellular signaling, and their study is central to pharmacology. A 
"two-state" model for such interactions studied in [5] is shown, pictorially, in Figure [2l 



^31 



i?2 + ^ 



k2\ 



^34 



^43 



H2 



^2A 



C2 



Figure 2: A two-state receptor-ligand network 

The species participating in this reaction are: i?i and i?2, which represent the free receptors in 
an inactive and active conformational state respectively, the free ligand L, and the respective 
receptor-ligand complexes Ci = RiL and C2 = R2L. 

The steady-states A = (Ai, A2, A3, A4, A5) = {Ri, R2, L,Ci,C2) of this system must satisfy the 
following polynomial equations: 

-{k2i + k3i)RiL + ki2Ci + ki3R2L = 

-{ki3 + k43)R2L + ksiRiL + k34C2 = 

-k2iRiL - ki3R2L + ki2Ci + /C34C2 = 

-{ki2 + k4:2)Ci + k2iRiL + k2iC2 = 

-(A;34 + fc24)C2 + A:42Ci + /c43i?2^ = 

For example, when all kinetic constants are ki = 1 (this is not a realistic biological choice of 
constants, but is picked simply for illustration), then A = (1, 1, 1, 1, 1) is a steady-state. 
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This reaction system is complex-balanced, because it is weakly reversible and it has 4 complexes, 
1 strongly connected component, and rank 3, and hence deficiency zero. 

The conservation of L + C1 + C2 (total amount of ligand) and Ri + R2 + Ci + C2 (total amount 
of receptors) leads to the matrix in ([6]). 

6.5 A two-component signaling system in bacteria 

The next example is from [4J. It models the "EnvZ/OmpR system" in E.coli bacteria. This 
system regulates the production of certain transport proteins (porins OmpF and OmpC) which 
act as pores allowing molecules to diffuse through the cell membrane. The system includes 
a kinase, EnvZ, which phosphorylates and dephosphorylates the response regulator OmpR, 
and is a particularly well-studied "two-component signaling system" in bacteria. The model is 
shown, pictorially, in Figure [3l where, for simplicity, we omit labeling each arrow by a reaction 
constant. We are using the following short-hand notations for the respective notations in [3]: 



Xi=R = OmpR, X2 = ZP = EnvZ-P (phosphorylated form), X3 = ERP = (EnvZ-P)OmpR 
(complex), X4 = Z = EnvZ, = RP = OmpR-P (phosphorylated form), and Xq = EPR = 
(EnvZ)OmpR-P (complex). 

This reaction system is complex-balanced, because it is weakly reversible and it has 5 complexes, 
1 strongly connected component, and rank 4, and hence deficiency zero. 

With all reaction constants equal to one, A = (1, 1, 1, 1, 1, 1) is a steady state. 

The system is described by six differential equations, subject to two constraints. These con- 
straints reflect that the total amount of each of OmpR and EnvZ should stay constant, respec- 
tively, and give the rows of the A matrix for this example as that shown in ([7]). 

6.6 A receptor antagonist model 

The paper [10] analyzes a model involving the cytokine Interleukin-1 (IL-1), which is produced 
in response to inflammatory stimuli. The species in the model are IL-1 (denoted as L for 
"ligand"), the IL-1 receptor (denoted by R), the human IL-1 receptor antagonist (denoted by 
A), a decoy receptor or "trap" (denoted by T) which, by binding to the ligand, helps block 
IL-1 signaling, and the four possible dimers RL, RA, AT, and LT. The model consists of four 



R + ZP 



EPR 




RP + Z 



Figure 3: A two-component signaling system 
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reversible reactions: 



R + L 








it + A 








A + T 








L + T 









RL 
RA 
AT 
LT 

This reaction system is complex-balanced, because it is weakly reversible and it has 8 complexes, 
4 strongly connected components, and rank 4, and hence deficiency zero. 

The total amounts of R, L, A, and T are conserved, giving rise to a matrix A with 4 rows. 
Ordering the states as follows: R, L, A, T, RL, RA, AT, LT, the resulting matrix is as in Q. 



6.7 A futile-cycle example 

We now describe an example that motivates looking at a matrix ^ as in ([8]). In contrast to 
the previous examples, however, this one is not complex-balanced and thus does not fit the 
assumptions for the ssCME having a solution in product form. So the interest in the condi- 
tional statistics problem for Poisson variables is purely academic for this particular example. 
Nonetheless, it is worth seeing how such a matrix A arises. 

"Futile cycles" involving phosphorylation and dephosphorylation are ubiquitous in molecular 
biology (see for example [15] for more discussion and references). In such processes, an enzyme 
E (a kinase) catalizes the transformation of a substrate S into a product P, passing through 
one or more intermediate complexes C. A different enzyme F (a phosphotase) catalizes the 
transformation of P back into S, also passing through one or mode intermediate complexes. 
The simplest model (just one intermediate) for such a reaction is as follows: 

fci fca 
E + S ^ C ^ E + P 

ks fcy 

F + P ^ D ^ F + S 

ki kg 

in which we used C and D to denote the intermediate complexes. (Usually, the backward 
reactions to complex dissociation, labeled by /c4 and fcg, are not included in the model, since 
they are energetically very unfavorable.) This system has deficiency one (6 complexes, two 
classes, and rank 3). Thus, the basic deficiency zero theory does not apply. Interestingly, 
however, a variation, "deficiency one theory" , can be used to predict the existance of multiple 
steady states for this system; see [6]. 

There are three conservation laws, corresponding to the conservation of total kinase, phospho- 
tase, and substrate/product. Ordering the variables as S, P, E, F,C, D, we obtain the matrix 
A as in ([8]). 



Appendix 

For completeness, we show here that complex balanced reactions admit product form equilib- 
rium densities for their Chemical Master Equations. The proof is basically that in [2l 111^112]. 
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Setup 



A chemical reaction network is specified by: 
TZ = {1, . . . , m}, the set of reactions. 
C C M"q, a finite set of complexes. 

Example: if there are two reactions 1: A + B^C + D and 2 : 2 A + C ^ B, then the set C will 
have four elements, listing the species participating in each: (1,1,0,0), (0,0,1,1), (2,0,1,0), 
(0,1,0,0). 

S", T : M ^ C are the source and target functions that describe which are the reactant and 
product complexes, respectively. 

Example: with the above reactions, S{1) = (1,1,0,0), r(l) = (0,0,1,1), 5(2) = (2,0,1,0), 
r(2) = (0,1,0,0). 

We make the following notational convention: for vectors x, c G 1^>0) '■— ^'i ■ ■ ■^'n (with 
O'^ = 1), and for nonnegative integer vectors = {Ni, . . . , A„), A^! := Ai! . . . A„!. 

By definition, a vector vr = {P{N),N G Z>o) is a steady-state solution of the Chemical Master 
Equation associated to a given reaction network if it satisfies: 

P{N - m + S{i)) A{N - T(i) + S{i)) = PiN) A{N) (13) 

for each A E Z>oi where Ai{N) is the ith "propensity function" [9]: Ai{N)dt is the probability 
that reaction i will occur in a small time interval [t,t + dt] if the state of the system is A^ at time 
t. This function is proportional to the number of ways in which the A^ molecules can combine 
to form the ith complex: 

A! 

Ai(N) = ki- — — . 

^ ^ (A-5(i))! 

The constant ki is related to the deterministic kinetic constant of the respective reaction through 
division by a power of the volume in which the reaction takes place. 

A complex balanced steady state (CBSS) with respect to the given network and kinetic constants 
/c is an X G M!^g (which is thought of as a vector of species concentrations) such that the following 
property holds for each complex c £ C: 

Y hx'^^ = Y ^i^'^"^ (14) 
ieT-i(c) je5-i(c) 

(note that one can equally well write "x'^" and bring this term outside of the sum, in the 
right-hand side). 

Complex balancing means that each "complex" is balanced in inflow and outflow. This is a 
Kirschoff current law (in-flux = out-flux, at each node) when one writes a chemical network. 

A counter-example to complex-balancing is this reaction network: 

A^B, 2B %2A 

(or, if one prefers reversible reactions, one may take instead an example due to Wegsheider, 
A <^ B and 2A <-> B). In steady state, kia — 2k2b'^ = 0. But complex-balancing would require 
that the outflow of "^" be zero (since there are no inflows into the "complex" A), which means 
kia = 0, and misses the nonzero steady states. (One could also argue with the complex 2A, or 
with B, or with 2B.) 
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A complex-balanced system is one with the property that every steady state is complex balanced. 
This concept was studied in detail by Horn and Jackson and by Feinberg in the early 1970s. 
Feinberg [7, 8J showed that for a special type of system (weakly reversible and deficiency zero), 
for any kinetic constants there is a steady state x G M"o satisfying (jl4p (and, in fact, every 
other steady state will also be complex balanced). Let us call a system with these properties a 
"Feinb erg-like system" . 

The Key Lemma 

Suppose that the reaction network is a Feinberg-like system. Let x satisfy where the k 's 

are the proportionality factors in Take any function a : C ^ M on complexes. Then: 



J]]A:,x^»-^»Q(r(i)) = Y,''MS{i)). (15) 

Proof. Since 

E = E E E = E E 

ien ceCieT-'^{c) i&n ceCies-^{c) 

it is enough to show that, for each fixed c: 

hx^^^--a{T{i)) = hoi{S{i)). 

i€T-^{,c) je5-i(c) 

Since T{i) = c and S{i) = c in the left-hand side and right-hand side respectively, this is the 
same as the CBSS condition upon multiplication by x~'^a{c). I 

Corollary. The vector 11 with 

P(N) = — 

is a steady state solution of the CME. 

Proof. Obvious using a(c) = j^z^y - ■ 
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